Analysis method, analysis apparatus, and computer-readable recording medium storing analysis program

ABSTRACT

Normal and abnormal states are calculated from log data with respect to each of a plurality of processings in which shared modules exist. A timing of a change of the states is calculated. A time interval, in which the normal and abnormal states are not mixed, is separated with respect to each of the plurality of processings, based on the calculated timing. In the time interval, an abnormal module is detected, based on relationship information between the plurality of processings and the modules.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2013-000705, filed on Jan. 7, 2013, the entire contents of which are incorporated herein by reference.

FIELD

The embodiments discussed herein are related to an analysis method, an analysis apparatus, and a computer-readable recording medium storing an analysis program.

BACKGROUND

In an application program, a network service, or the like, attempts have been made to find a delay point or an abnormal point (see, for example, the following Patent Documents 1 and 2).

Generally, in order to find the delay point or the abnormal point, it is necessary to collect a log before and after the point and continue to monitor a state of the point. For example, in the case of a processing sequence of start-A-B-C-D-end, a delay of each processing of A to D can be found by collecting time-stamped logs immediately before A, in the middle of A-B, in the middle of B-C, in the middle of C-D, and immediately after D. For example, in a case where B is delayed, the delay can be found using a log between A and B (immediately before B) and a log between B and C (immediately after B), that is, the logs before and after B.

On the other hand, in order to find an abnormal point in an application program or a network component, it is necessary to collect a large number of logs at a plurality of monitoring points.

For this reason, narrowing and identifying an abnormal point would cause a considerable execution overhead and network load.

-   [Patent Document 1] Japanese Patent Publication No. 2011-211295
-   [Patent Document 2] Japanese Patent Publication No. 2006-222808

In such an application program or network component, processing using a common module is performed in a plurality of processings.

A delay of a particular module causes a delay in a plurality of relevant processings. An identification of a module being delayed may be performed by, for example, taking an average of a response time at regular time intervals, comparing the average time with a normal or abnormal threshold value, and classifying normally operating processing and delayed processing based on the comparison result.

However, depending on how the length, the timing, or the like of the interval over which the processing time is averaged is chosen, appropriate data may not be obtained. For example, processing that should be diagnosed as abnormal may be classified as normal processing, or information necessary for identifying a problem may not be obtained.

SUMMARY

An aspect of an analysis method includes: calculating normal and abnormal states from log data with respect to each of a plurality of processings in which shared modules exist; calculating a timing of a change of the states; separating a time interval, in which the normal and abnormal states are not mixed, with respect to each of the plurality of processings, based on the calculated timing; and detecting an abnormal module in the time interval, based on relationship information between the plurality of processings and the modules.

The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example of a network system according to an embodiment;

FIG. 2 is a diagram illustrating an example of a relationship between functions and components according to an embodiment;

FIGS. 3A to 3D are diagrams illustrating an example of matrix expressions of the relationship between the functions and the components according to the embodiment;

FIG. 4 is a flowchart describing an example of an operation of an analysis phase according to an embodiment;

FIG. 5 is a flowchart describing an example of an operation of an operation phase according to an embodiment;

FIG. 6 is a flowchart describing an example of association processing illustrated in FIG. 4;

FIG. 7 is a diagram schematically describing an example of the association processing illustrated in FIG. 4;

FIG. 8 is a diagram illustrating an example of an analysis result notification window of the operation phase according to an embodiment;

FIG. 9 is a diagram illustrating an example of a notification window when the time of delay is detected according to an embodiment;

FIG. 10 is a diagram schematically illustrating a state in which normal and abnormal data are mixed in an aggregation interval of each function according to an embodiment;

FIG. 11 is a diagram schematically describing an example of a problem when the aggregation interval is minimized in FIG. 10;

FIG. 12 is a diagram schematically describing a state in which a normal interval and an abnormal interval in FIG. 10 are divided and determined in a superimposed manner;

FIG. 13 is a diagram describing a case example of a business processing system according to an embodiment;

FIG. 14 is a diagram schematically describing an example of an abnormal development in the business processing system illustrated in FIG. 13;

FIG. 15 is a diagram schematically describing a case where the analysis method according to the embodiment is applied to the business processing system;

FIG. 16 is a flowchart describing advance preparation processing according to an embodiment;

FIG. 17 is a flowchart describing an example of an operation in an operation phase according to an embodiment;

FIG. 18 is a diagram illustrating a state in which unit of request-response data (RR data) is set as a determination interval in an embodiment;

FIG. 19 is a diagram illustrating a state in which normal RR data are merged and set as a normal interval, and abnormal RR data are merged and set as an abnormal interval;

FIG. 20 is a diagram illustrating a state in which an interval where RR data of switch of the normal interval and the abnormal interval does not exist is treated as no data in an embodiment;

FIG. 21 is a diagram illustrating a state in which an interval where RR data of switch of the normal interval and the abnormal interval does not exist is treated as no data in an embodiment;

FIG. 22 is a diagram illustrating a state in which an interval is switched at a timing where next RR data of switch of the normal interval and the abnormal interval appears in an embodiment;

FIG. 23 is a diagram illustrating a state in which an interval is switched at an end timing of the last RR data of the same type of RR data in an embodiment;

FIG. 24 is a diagram illustrating a state in which a switching is performed at a middle point of a normal RR data group and an abnormal RR data group;

FIG. 25 is a diagram illustrating a state of a case where RR data are superimposed in an embodiment;

FIG. 26 is a diagram illustrating a state in which an interval from the start to the end of the same type of RR data is set as one normal interval or abnormal interval in an embodiment;

FIG. 27 is a diagram illustrating a state in which an interval is separated at a start timing (appearance timing) of different types of next RR data in an embodiment;

FIG. 28 is a diagram illustrating a state in which an interval is separated at an end timing of the last RR data of the previous type before appearance of different types of RR data in an embodiment;

FIG. 29 is a diagram illustrating a state in which an interval is cut as a normal interval at the start of normal RR data, and an interval is cut at the end of normal RR data;

FIGS. 30A and 30B are comparative diagrams illustrating a state in which RR data of a part of a function does not appear in different timings in an embodiment;

FIGS. 31A and 31B are diagrams illustrating a comparison between a case where only one RR data appears and a case where a plurality of RR data appears, in different timings in an embodiment;

FIGS. 32A to 32C are diagrams schematically describing a concrete conflict and an implicit conflict according to an embodiment;

FIG. 33 is a diagram illustrating an example of a relationship between functions and components according to an embodiment;

FIG. 34 is a flowchart describing a generation of a supplementary table (exclusive point table) in an embodiment;

FIGS. 35A and 35B are diagrams illustrating an example of a path information table and an exclusive point table according to an embodiment;

FIG. 36 is a supplementing flowchart according to an embodiment;

FIG. 37 is a diagram illustrating an example of a relationship between functions and components according to an embodiment;

FIG. 38 is a diagram illustrating an example of frequency information (table) according to an embodiment;

FIG. 39 is a flowchart describing an example of function selection processing according to an embodiment;

FIG. 40 is a diagram illustrating an example of a relationship between functions and components according to an embodiment;

FIG. 41 is a diagram illustrating an example of frequency information (table) according to an embodiment; and

FIG. 42 is a diagram illustrating an example of frequency information (table) according to an embodiment.

DESCRIPTION OF EMBODIMENT(S)

Hereinafter, embodiments will be described with reference to the drawings. However, the embodiments to be described below are merely exemplary and are not intended to exclude a variety of unspecified modifications or technical applications. Note that, in the drawings referred to in the following embodiments, parts assigned with the same reference numerals indicate the same or similar parts unless otherwise particularly mentioned.

FIG. 1 is a block diagram illustrating an example of a network system according to an embodiment. The network system illustrated in FIG. 1, for example, includes a network 10 such as the Internet or the like, server groups 20, 30 and 40 connected to the network 10, and a network switch 50. The server groups 20, 30 and 40, for example, include a web server 30, an application (AP) server 40, another server 20, and the like.

The AP server 40, for example, includes a pre-analysis block 401, an operation block 402, a user request database 403, and a path information database 404. Optionally, the AP server 40 may include an appearance probability database 405.

The AP server 40 includes a CPU, a memory, a storage device such as a hard disk device or the like, a display device, and a printer, which are not illustrated in the drawings. The CPU implements a necessary function unit by reading and operating a predetermined program from the memory or the storage device. For example, the program includes an analysis program as an example of the program that implements the function of the pre-analysis block 401 or the operation block 402. The display device or the printer, for example, may output the results of operations by the CPU. Note that, the other server 20 or the web server 30 may include a CPU, a memory, a storage device such as a hard disk device or the like, a display device, and a printer as hardware devices.

The function of the analysis program (all or partial function of each unit) is implemented in such a manner that the CPU or the like executes the predetermined program.

The program, for example, is provided in a form of being stored in a computer-readable recording medium such as a floppy disk, a CD-ROM, a CD-R, a CD-RW, an MO, a DVD, a Blu-ray disc, a portable hard disk, a USB memory, or the like. In this case, the computer uses the program that is read from the recording medium, transmitted to an internal storage device or an external storage device, and stored in the storage device. Also, the program, for example, may be stored in a storage device (recording medium) such as a magnetic disk, an optical disk, an optical magnetic disk, or the like, and provided to the computer from the storage device via a communication line.

The computer as used herein is a concept encompassing hardware and an operating system (OS), and refers to hardware that operates under the control of the OS. Also, in a case where the OS is not required and an application program alone is designed to operate the hardware, the hardware itself corresponds to the computer. The hardware includes at least a microprocessor such as the CPU or the like, and a unit configured to read the program stored in the recording medium.

The application program includes a program code executing the functions of the analysis program in the computer as described above. Also, a part of the function may be implemented not by the application program but by the OS.

Also, the recording medium may use a variety of computer-readable media such as an IC card, a ROM cartridge, a magnetic tape, a punch card, an internal storage device (memory such as a RAM or a ROM) of the computer, an external storage device of the computer, or a printed matter with a printed code such as a bar code, in addition to the floppy disk, the CD-ROM, the CD-R, the CD-RW, the DVD, the magnetic disk, and the optical magnetic disk as described above.

The user request database 403, the path information database 404, and the appearance probability database 405, for example, are implemented in the memory or the storage device of the AP server 40.

The pre-analysis block 401, for example, includes a pre-data collection unit 410 and a path analysis unit 420.

The pre-data collection unit 410 inputs (transmits) data (request or the like) of the user request database 403 to the network 10 as virtual user data. Note that, the pre-data collection unit 410 may store an actual request, state, and the like of an actual operation and reproduce an operational state of an actual operation.

The path analysis unit 420, for example, collects message data flowing through the respective servers 20, 30 and 40 as the result by the input of the virtual user data, performs the path analysis, and stores the analysis result in the path information database 404 as the path information.

The operation block 402, for example, includes an operational data collection unit 430, a function selection unit 440, a data slicing unit 450, and a problem point identifying unit 460.

The operational data collection unit 430 collects, for example, uniform resource locator (URL)+common gateway interface (CGI) parameter or the like from data flowing through the servers 20, 30 and 40 during the actual operation in the operation phase as, for example, log data. Note that, in the actual operation, only information of a “front server” may be collected. The “front server” refers to a server closest to the user side, which receives the request from the user, as compared with “all servers” in the pre-analysis phase. In the configuration illustrated in FIG. 1, the web server 30 may correspond to the “front server”. However, depending on the configuration, a load distribution server (load balancer; not illustrated) may correspond to the “front server”, and the AP server 40 may correspond to the “front server”.

The function selection unit 440 compares the collected log data with the path information of the path information database 404, and performs the function selection (classification) of the log data.

The data slicing unit 450 performs processing of cutting a time interval in which normal and abnormal states are not mixed in each selected function (processing of calculating a state change timing). Details will be described below.

The problem point identifying unit 460 performs a delay detection in the time interval cut by the data slicing unit 450, and narrows or identifies a problem point by comparison with the path information when the delay is detected.

The “functions” (or “processings”) as used herein are classified as follows.

First, captured actual data is collected or data is collected by reproducing (replaying) test data by the pre-data collection unit 410, and the path analysis unit 420 classifies the path of each function of the system.

For example, as illustrated in FIG. 2, when p1 to p5 are assumed as network components, message data flowing through the respective components p1 to p5 are analyzed, and the functions (Fi: i is a natural number) are classified by the URL+CGI parameter. It can be seen that the respective functions pass through the following paths. Note that, the components p1 to p5 may be processed in units of methods or blocks of a program. The term “component” may be replaced with the term “module” or “check point”. Also, the “path” is positioned as a set of “components”.

F1=http://foo.com/appli1.cgi?flag=exec path=p1-p2-p4-p5

F2=http://foo.com/appli1.cgi?flag=calc path=p1-p3-p5

F3=http://foo.com/appli1.cgi?data=true path=p1-p2

F4=http://foo.com/appli2.cgi?feature=3 path=p3-p4

When F1 and F2 are delayed more than usual, the problem point identifying unit 460 may determine in view of the analyzed path information that the check points p1, p2, p3, p4 and p5, through which F1 and F2 pass, may have a problem (abnormality).

Also, for example, it may be determined that there are no problems in the check points p1, p2, p3 and p4, which F1 and F2 share with F3 and F4, from the information indicating that F3 and F4 are not delayed and the path information of F3 and F4. As a result, the remaining check point p5 may be diagnosed as the cause of the delay.

Note that, when the analysis target is a program, p1 to p5 may be processed as method (function) call unit, block unit, log output point unit designated by a user, or a combination thereof, as exemplarily described below.

Method (Function) Call Unit

p1=method1( )→p2=method2( )→p4=method3( ) . . . .

Block Unit (Blocks Divided by if Statement or { })

p1=while( . . . )→p2=if( )→p4=else . . . .

Log Output Point Designated by User

p1={file=foo.java, line=35}→p2={file=foo.java, line=55}→p4={file=boo.java, line=20} etc.

As a simple example, as illustrated in FIG. 3A, the path information may express the respective functions F1 to F4 and the check points p1 to p5 in a matrix. Note that, the matrix expression is an example of processing in the analysis phase.

As illustrated in FIG. 3B, the check points of the deteriorated functions (F1 and F2 in the example of FIG. 2) are detected by a logical sum (OR). Next, as illustrated in FIG. 3C, the check points of the non-deteriorated functions (F3 and F4 in the example of FIG. 2) are detected by OR.

Further, as illustrated in FIG. 3D, an exclusive logical sum (XOR) is performed on the result of FIG. 3B and the result of FIG. 3C. Next, a logical product (AND) is performed on the result of FIG. 3B and the result of FIG. 3D. In the present example, the result of AND is identical to that of FIG. 3D. As illustrated in FIG. 3D, p5, in which “1” remains, may be identified as the problem point, based on the result of AND.
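As a non-limiting illustration, the OR, XOR and AND operations of FIGS. 3B to 3D may be sketched with one bit per check point. The path information is taken from the F1 to F4 example above; the helper names (to_mask, from_mask, narrow) are hypothetical and do not appear in the embodiment.

```python
# Path information of FIG. 2: which check points each function passes through.
CHECK_POINTS = ["p1", "p2", "p3", "p4", "p5"]
PATHS = {
    "F1": {"p1", "p2", "p4", "p5"},
    "F2": {"p1", "p3", "p5"},
    "F3": {"p1", "p2"},
    "F4": {"p3", "p4"},
}


def to_mask(points):
    """Encode a set of check points as a bit mask (bit i stands for CHECK_POINTS[i])."""
    mask = 0
    for i, name in enumerate(CHECK_POINTS):
        if name in points:
            mask |= 1 << i
    return mask


def from_mask(mask):
    """Decode a bit mask back into a list of check point names."""
    return [name for i, name in enumerate(CHECK_POINTS) if mask & (1 << i)]


def narrow(degraded, healthy):
    """Hypothetical helper: narrow the problem points as in FIGS. 3B to 3D."""
    bad = 0
    for f in degraded:
        bad |= to_mask(PATHS[f])    # OR over deteriorated functions (FIG. 3B)
    good = 0
    for f in healthy:
        good |= to_mask(PATHS[f])   # OR over non-deteriorated functions (FIG. 3C)
    diff = bad ^ good               # XOR of the two results (FIG. 3D)
    return from_mask(bad & diff)    # AND with the deteriorated result


suspects = narrow(["F1", "F2"], ["F3", "F4"])
print(suspects)  # ['p5']
```

The final AND keeps only check points that the deteriorated functions actually use, so a check point used solely by non-deteriorated functions is not reported even though it survives the XOR.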

(Analysis Phase)

As illustrated in FIG. 4, in the analysis phase (pre-analysis block 401), two functions may be executed in parallel.

First, in the pre-analysis block 401, the pre-data collection unit 410 inputs a request message to the servers 20, 30 and 40 by reproducing request data prepared in advance in the user request database 403 (data reproduction: processing P10). The processing is repeated until a predetermined end condition is satisfied (until determined as Yes in processing P20) (No route of processing P20). Note that, as the request data, those collected in the actual operation, those generated as test data, or the like may be used.

The pre-data collection unit 410 acquires data by capturing network data called by the data input in the data reproduction or by acquiring log data of the servers 20, 30 and 40 (processing P30).

Next, in the pre-analysis block 401, for example, the path analysis unit 420 performs association processing on the acquired data and generates path information (processing P40). An example of the association processing is illustrated in FIGS. 6 and 7.

As illustrated in FIG. 6, the path analysis unit 420 checks whether data to be associated exists (processing P410). When the data does not exist, the path analysis unit 420 waits until data appears (No route of processing P410), and when the data exists, the path analysis unit 420 selects a data type (application or database, and the like) (processing P420 in Yes route of processing P410).

Next, the path analysis unit 420 performs primary association processing on each selected type (processing P430). Further, the path analysis unit 420 checks whether a transaction is ended (processing P440). When all constituent data types of data are provided, it is determined as the transaction end (Yes route of processing P440), and the path analysis unit 420 performs secondary association processing on all constituent data types of data by using an identification key (processing P450). Note that, processings subsequent to processing P410 are repeated until it is determined as the transaction end (No route of processing P440).

FIG. 7 illustrates an example of the primary association processing and the secondary association processing. On the lower left side of FIG. 7, a data structure including a time stamp, a transaction ID, and other information is illustrated as a data example of the application (AP). Meanwhile, on the lower right side of FIG. 7, a data structure including a time stamp, a session ID, other information, and a transaction ID is illustrated as a data example of the database (DB).

The upper side of FIG. 7 illustrates a state in which data illustrated on the lower side of FIG. 7 are selected for each data type. Further, as illustrated on the upper side of FIG. 7, the data of the AP are primarily associated by a unique selection key of the AP (for example, the transaction ID (t01, t02, and the like)), and the data of the DB are primarily associated by a unique selection key of the DB (for example, the session ID (s35, s35, and the like)).

Different types of data are secondarily associated by an identification key (for example, the transaction ID (t01, t02, and the like)). Note that, not all data necessarily have the identification keys that are needed for the secondary association.
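As a non-limiting illustration, the primary and secondary association may be sketched as a two-stage grouping. The record fields and sample values below are hypothetical, and the rule that a whole DB session is attached to a transaction when at least one record of the session carries the transaction ID is an assumption made for this sketch, not a statement of the embodiment.

```python
from collections import defaultdict

# Hypothetical sample records modeled on the data structures of FIG. 7.
ap_records = [
    {"time": 1.00, "tx": "t01", "info": "request in"},
    {"time": 1.20, "tx": "t02", "info": "request in"},
    {"time": 1.90, "tx": "t01", "info": "response out"},
]
db_records = [
    {"time": 1.10, "session": "s35", "tx": "t01", "info": "SELECT"},
    {"time": 1.30, "session": "s35", "tx": None, "info": "FETCH"},  # no identification key
    {"time": 1.40, "session": "s36", "tx": "t02", "info": "UPDATE"},
]


def group_by(records, key):
    """Primary association: bind records of one data type by its selection key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    return dict(groups)


def associate(ap, db):
    """Secondary association: join AP and DB groups by the transaction ID."""
    ap_by_tx = group_by(ap, "tx")
    db_by_session = group_by(db, "session")
    result = {tx: {"ap": recs, "db": []} for tx, recs in ap_by_tx.items()}
    for session_recs in db_by_session.values():
        # Assumption: one keyed record is enough to link the whole session.
        tx_ids = {r["tx"] for r in session_recs if r["tx"] is not None}
        for tx in tx_ids:
            if tx in result:
                result[tx]["db"].extend(session_recs)
    return result


linked = associate(ap_records, db_records)
print(len(linked["t01"]["db"]))  # 2
```

In this sketch, the DB record without a transaction ID is still attached to t01 through its session, which reflects the note that not all data carry the identification key.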

When the secondary association is completed, the path analysis unit 420registers (stores) the associated result (processing P460).

When such association processing is completed, the path analysis unit 420 performs function extraction processing as illustrated in FIG. 4 (processing P50). The function extraction processing is an example of processing of extracting and classifying the functions from the above associated result and the URL+CGI parameter.

The path analysis unit 420 registers the analysis result in the path information database 404 as the path information (processing P60). Note that, as described below, in order to improve the accuracy of the problem point identification, a method using appearance probability (frequency) information may be considered. In this case, the path analysis unit 420 stores the appearance probability information in the appearance probability database 405 (see FIG. 1).

(Operation Phase)

Next, an example of processing in the operation phase will be described with reference to FIG. 5.

In the operation phase (operation block 402), the operational data collection unit 430 collects information such as the URL+CGI parameter and the response time among the actual operational data from the network switch 50 or the web server 30 (processing P100).

Next, in the operation block 402, the function selection unit 440 selects function units from the collected data, based on the parameters such as URL, CGI, and the like (processing P110).

Further, in the operation block 402, the data slicing unit 450 performs function extraction processing, that is, processing of cutting a time interval in which normal and abnormal states are not mixed in each selected function (processing of calculating a state change timing) (processing P120). Note that, when the selected function is not included in the path information, it is applied to the function of the path information.

Thereafter, the data slicing unit 450 registers (stores) the function and response information in an analysis target data table (not illustrated) as aggregation information (processing P130). An example of a registration form is illustrated in Table 1 below.

TABLE 1 Example of registration form of analysis data table

Function    Separation: Normal/Abnormal      Interval ID
            (Normal = 0, Abnormal = 1)
F1          1                                12
F2          0                                12
F3          0                                12
. . .       . . .                            . . .

In the example of Table 1 above, an entry in which data appears in an interval identified by the interval ID is registered. F3 represents that no data has existed in that interval. Note that, the interval information corresponding to the interval ID, for example, may be managed in another table (interval table) illustrated in Table 2 below. A length of the interval may be different at each slice.

TABLE 2 Example of interval table

Interval ID    Start Time    End Time
11             123.333       123.555
12             123.580       134.222
. . .          . . .         . . .
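As a non-limiting illustration, the analysis data table and the interval table may be held as two small record types joined by the interval ID. The values are those shown in Tables 1 and 2; the class names and the lookup helper states_at are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One row of the interval table (Table 2)."""
    interval_id: int
    start: float
    end: float


@dataclass(frozen=True)
class Entry:
    """One row of the analysis data table (Table 1)."""
    function: str
    abnormal: int  # Normal = 0, Abnormal = 1
    interval_id: int


INTERVALS = {
    11: Interval(11, 123.333, 123.555),
    12: Interval(12, 123.580, 134.222),
}
ENTRIES = [Entry("F1", 1, 12), Entry("F2", 0, 12), Entry("F3", 0, 12)]


def states_at(t):
    """Hypothetical lookup: registered state of each function at time t."""
    ids = {iv.interval_id for iv in INTERVALS.values() if iv.start <= t <= iv.end}
    return {e.function: e.abnormal for e in ENTRIES if e.interval_id in ids}


print(states_at(125.0))  # {'F1': 1, 'F2': 0, 'F3': 0}
```

Keeping the interval bounds in a separate table allows the interval length to differ at each slice without repeating the bounds in every function entry.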

Next, in the operation block 402, the problem point identifying unit 460 determines whether the response is degraded (processing P140). The determination may be performed in single response unit or aggregation unit.

When the response is not degraded, the operation block 402 repeats the processings subsequent to processing P100 (No route of processing P140). On the other hand, when the response is degraded (Yes route of processing P140), the problem point identifying unit 460 performs the problem point identification by comparing the aggregation information and the path information (processing P150).

When the problem point identification is possible (Yes route of processing P160), the problem point identifying unit 460 outputs the information of the identified problem point on the display device or the like (processing P170). At this time, when a plurality of candidates exists, the plurality of candidates may be output after assigning priorities thereto. However, the priorities may not be assigned.

An example of output data is illustrated in FIG. 8. An example of an analysis result notification window 500 in the actual operation phase is illustrated on the left side of FIG. 8. For example, information such as a date and time of delay generation, an estimated delay point, and the like is displayed on the notification window 500.

When more information about the delay point is desired, a details display window 510 illustrated on the right side of FIG. 8 may be displayed, for example, by selecting a details display button 501 provided on the notification window 500. On the details display window 510, details display buttons 511 to 515 may also be disposed corresponding to the information to be displayed. When still more information is desired on the details display window 510, it may be displayed by selecting the corresponding details display buttons 511 to 515.

When the problem point identification is impossible (No route of processing P160), the problem point identifying unit 460 outputs the degradation detection on the display device or the like (processing P180). An example of a notification window 520 displayed when a delay is detected is illustrated in FIG. 9. For example, information such as a date and time of detection of delay generation, functions (URL or the like) for which the delay generation has been detected, and the like is displayed on the notification window 520.

On the notification window 520, details display buttons 521 and 522 may be disposed corresponding to the functions for which the delay generation has been detected. More information, for example, an average of the response time, may be displayed by selecting the details display button 521 or 522.

Next, a problem when normal and abnormal data are mixed will be described with reference to FIGS. 10 and 11.

In FIGS. 10 and 11, “abnormal interval” illustrates “time interval of abnormal data”, and “normal interval” illustrates “time interval of normal data”. The “abnormal data”, for example, refers to data representing that a response time is longer than a normal range, and the “normal data”, for example, refers to data representing that a response time is within a normal range.

Even in the same functions, normal data and abnormal data may be mixed depending on timings. In that case, the aforementioned narrowing using the matrix cannot be performed.

For example, when a threshold value of the response time is 1 second (abnormal if equal to or more than 1 second, and normal if less than 1 second), the analysis may not be accurate even if an average of 1 second is determined as abnormal (see, for example, an arrow 601 of FIG. 10). As such, when there is a problem caused by a delicate matter of timing, a correct determination may not be made because the determination result becomes either normal or abnormal on average. Also, when the response times of the plurality of functions (F1, F2, . . . ) are all around the threshold value, the analysis result may not be reliable at all.
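As a non-limiting illustration, the effect of averaging over an interval in which normal and abnormal data are mixed may be shown with the 1-second threshold mentioned above. The sample response times are hypothetical values chosen for this sketch.

```python
THRESHOLD = 1.0  # seconds; abnormal if equal to or more than the threshold


def classify(response_time):
    """Classify a single response time against the threshold."""
    return "abnormal" if response_time >= THRESHOLD else "normal"


def classify_by_average(response_times):
    """Classify a whole aggregation interval by its average response time."""
    avg = sum(response_times) / len(response_times)
    return classify(avg)


# Hypothetical aggregation intervals in which both states are mixed.
mostly_normal = [0.25, 0.25, 0.25, 0.25, 0.25, 2.5]   # average 0.625 s
half_and_half = [0.25, 0.25, 0.25, 0.25, 2.5, 2.5]    # average 1.0 s

print(classify_by_average(mostly_normal))  # normal
print(classify_by_average(half_and_half))  # abnormal
print([classify(t) for t in mostly_normal].count("abnormal"))  # 1
```

In the first interval the single 2.5-second response is hidden by the average, and in the second interval four normal responses are reported as abnormal, so neither label is usable for the matrix-based narrowing.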

Note that, since the technique of Patent Document 1 is a detection of a malfunction of network equipment, a normal state and an abnormal state are clearly divided (the mixture of normal/abnormal data is not considered).

Therefore, in the present embodiment, the narrowing can be enabled by automatically cutting the region (time interval) in which the normal and abnormal states are not mixed.

As an example of the basic processing, first, a timing of a change of the normal and abnormal states is calculated by each URL, and a time interval in which the normal and abnormal states are not mixed is separated by each URL, based on the corresponding timing. In a range where each time interval is superimposed, a matrix is made and an operation is performed (an abnormal module being a problem point is calculated (detected), based on “relationship information” between the plurality of processings (or functions) and the modules).

Note that, the “relationship information” may be appropriately updated. For example, the request data in the actual operation phase is stored in the user request database 403, and when unknown data having not appeared in the pre-analysis phase appears in the actual operation phase, the “relationship information” is updated by performing the pre-analysis again by using the stored request data.

However, if the interval in which the normal and abnormal states are not mixed in one URL is further cut by a plurality of URLs, the interval is cut into too small pieces and thus the number of combinations (computation time) becomes enormous. Therefore, among the processings exemplified in the following (a) to (c), the abnormal point narrowing is performed by only (a), (a)+(b), (a)+(c), or (a)+(b)+(c).
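As a non-limiting illustration, the separation of time intervals for one URL may be sketched by labeling each request-response record and merging consecutive records of the same state. The boundary rule used here (an interval runs from the start of the first record to the latest end of the records in the same run) is only one possible choice assumed for this sketch, overlapping records are not treated specially, and the sample data and the 1-second threshold are hypothetical.

```python
from itertools import groupby

THRESHOLD = 1.0  # seconds; abnormal if equal to or more than the threshold


def slice_states(rr_data):
    """Split one URL's records into intervals in which the states are not mixed.

    rr_data: list of (start_time, response_time) tuples for a single URL.
    Returns a list of (interval_start, interval_end, is_abnormal) tuples.
    """
    ordered = sorted(rr_data)
    # Label each record with its end time and its normal/abnormal state.
    labelled = [(s, s + r, r >= THRESHOLD) for s, r in ordered]
    intervals = []
    # A state change between consecutive records starts a new interval.
    for abnormal, run in groupby(labelled, key=lambda rec: rec[2]):
        run = list(run)
        start = run[0][0]
        end = max(e for _, e, _ in run)
        intervals.append((start, end, abnormal))
    return intervals


# Hypothetical records of one URL: normal, normal, abnormal, abnormal, normal.
f1 = [(0.0, 0.25), (1.0, 0.5), (2.0, 1.5), (4.0, 2.0), (7.0, 0.25)]
print(slice_states(f1))
# [(0.0, 1.5, False), (2.0, 6.0, True), (7.0, 7.25, False)]
```

The state change timings are the points at which the grouping key flips, so no fixed aggregation interval has to be chosen in advance.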

(a) A slice that does not include an abnormal state is excluded.

(b) An operation is performed by selecting a slice that covers more points (components). For example, since the components used by each URL are already known (analyzed), a slice including more components by combination is selected. Candidates of the combination are prepared by calculating in advance which combination of URLs covers the most components.

(c) A slice that covers more URLs is selected and an operation isperformed.
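As a non-limiting illustration, the processings (a) to (c) may be sketched as a filter followed by a ranking. The sketch applies all three together, which is only one of the combinations listed above; the path information is taken from the F1 to F4 example, and the slice contents and helper names are hypothetical.

```python
# Path information of the F1 to F4 example (components used by each URL).
PATHS = {
    "F1": {"p1", "p2", "p4", "p5"},
    "F2": {"p1", "p3", "p5"},
    "F3": {"p1", "p2"},
    "F4": {"p3", "p4"},
}

# Hypothetical slices: state of each URL observed in the slice (1 = abnormal).
slices = [
    {"id": 1, "states": {"F3": 0, "F4": 0}},
    {"id": 2, "states": {"F1": 1, "F3": 0}},
    {"id": 3, "states": {"F1": 1, "F2": 1, "F3": 0, "F4": 0}},
]


def covered_components(slice_):
    """Union of the components used by the URLs that appear in the slice."""
    points = set()
    for function in slice_["states"]:
        points |= PATHS[function]
    return points


def rank_slices(candidates):
    """Apply (a), then order by (b) component coverage and (c) URL count."""
    # (a) Exclude slices that do not include an abnormal state.
    kept = [s for s in candidates if any(s["states"].values())]
    # (b) More covered components first; (c) more URLs first as a tie-break.
    return sorted(
        kept,
        key=lambda s: (len(covered_components(s)), len(s["states"])),
        reverse=True,
    )


ranked = rank_slices(slices)
print([s["id"] for s in ranked])  # [3, 2]
```

Slice 1 is dropped by (a) because it contains no abnormal state, and slice 3 is preferred because it covers all five components with four URLs.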

(Solution to Minimize Aggregation Interval)

Although wanting to make the corresponding operation applicable byadjusting the aggregation interval, effective data cannot be found bymerely shortening the aggregation interval. If the aggregation intervalis excessively shortened, functions (URLs) appearing at the same timeare reduced and thus the analysis is not effective. Also, if datasuitable for the analysis in various durations are found while changingthe duration, the combinations are exploded and an estimate of acomputation amount becomes impossible.

For example, as indicated by reference numeral 602 in FIG. 11, when the aggregation interval is shortened, the data necessary for the determination (in this case, F1, F2, F3 and F4) are incomplete. Also, as indicated by reference numeral 603 in FIG. 11, when the aggregation interval is shortened further and the search is performed while sliding the aggregation interval, the interval necessary for the analysis may be found by chance, depending on the timing. However, the number of combinations becomes unbounded and the computation time is insufficient.

(Determination in Superimposed Manner by Separating Normal Interval and Abnormal Interval)

In the present embodiment, for example, as illustrated in FIG. 12, the normal interval and the abnormal interval are divided for each function (for example, URL), and the superimposed region of the intervals is used for analysis. Therefore, analyzable data can be found while suppressing the computation amount, and analysis accuracy is improved. Note that, in FIG. 12, for the functions F1 and F4, it is assumed that similar abnormal or normal data exist temporally before and after. Further, FIG. 12 illustrates a state in which the interval (determination interval) is divided into two intervals by data of the function F3.

(Case Example in Business Processing System)

A problem occurring when a new service (airline ticketing system) of a business processing system is provided will be described with reference to FIGS. 13 and 14.

FIG. 13 illustrates a state in which functions (F1, F2 and F3) and paths are set as described below.

F1=pre-settlement path=p1 (travel expense)-p2 (settlement)-p4 (DB1)

F2=post-settlement path=p1 (travel expense)-p3 (reservation inquiry)-p5 (DB2)-p2 (settlement)-p4 (DB1)

F3=airline ticketing status path=p1 (travel expense)-p3 (reservation inquiry)-p5 (DB2)

There were no problems at the beginning of the system operation, but a slowdown of the system occurred after one month. The direct cause is an increase in the load of the reservation inquiry (p3) caused by the airline ticketing status (F3) and the post-settlement (F2): the reservation inquiry (p3) searches all cases, and it is performed in the post-settlement (F2) of the travel expense regardless of whether airline ticketing exists.

Since an operator could not anticipate the increase in the load of the airline reservation inquiry (p3) due to the post-settlement, it took a long time to isolate the problem.

(Occurrence of Symptom in Business Processing System)

For example, as illustrated in FIG. 14, in a usual aggregation interval, F1, F2 and F3 are classified into F1=normal, F2=normal, and F3=abnormal, so the analysis is not performed correctly. If the classification were F1=normal, F2=abnormal, and F3=abnormal, the determination would be possible.

Diagnosis by the Present Embodiment

Advance Preparation

First, the path analysis unit 420 (see FIG. 1) of the pre-analysis block 401 classifies the businesses and/or functions by URL (+argument) (F1 to F3), and sets path information for each classified business and/or function (processings P211 and P212 of FIG. 16). For example, as described below, the components p1 to p5 are set for each of the functions F1 to F3.

F1=http://foo/ . . . pre-settlement:p1-p2-p4

F2=http://boo/ . . . post-settlement:p1-p2-p3-p4-p5

F3=http://bar/ . . . airline ticketing status:p1-p3-p5

Overview of Diagnosis

In a case where F1 is normal and F2 and F3 are delayed, abnormal components are diagnosed as follows. Since F2 and F3 are abnormal, it may be determined from the path information of F2 and F3 that p1, p2, p3, p4 and p5 (that is, all components in the present example) may be abnormal. Since F1 is normal, p1, p2 and p4, which appear in the path information of F1, are excluded from the candidates.

As a result, p3 (reservation inquiry) and p5 (DB2) are diagnosed as the cause of the delay. Note that, for the abnormal components primarily isolated by the diagnosis, a prompt response is enabled by automatically performing additional monitoring or analysis.
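The diagnosis above amounts to a set operation on the path information: the union of the components of the abnormal functions, minus the union of the components of the normal functions. A minimal sketch follows; the function name `narrow_suspects` and the dictionary representation are assumptions made for illustration.

```python
from typing import Dict, Set


def narrow_suspects(paths: Dict[str, Set[str]], states: Dict[str, str]) -> Set[str]:
    """Return components on an abnormal path that are on no normal path."""
    suspects: Set[str] = set()
    cleared: Set[str] = set()
    for func, state in states.items():
        if state == "abnormal":
            suspects |= paths[func]   # every component on a delayed path is a candidate
        elif state == "normal":
            cleared |= paths[func]    # components on a normal path are excluded
    return suspects - cleared


# Path information set in the advance preparation (F1 to F3).
PATHS = {
    "F1": {"p1", "p2", "p4"},
    "F2": {"p1", "p2", "p3", "p4", "p5"},
    "F3": {"p1", "p3", "p5"},
}

result = narrow_suspects(PATHS, {"F1": "normal", "F2": "abnormal", "F3": "abnormal"})
print(sorted(result))  # ['p3', 'p5']
```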

FIG. 17 illustrates an example of processing flow in the actual operation phase.

First, the data slicing unit 450 classifies the normal interval and the abnormal interval for each path (processing P221), and generates slices of all intervals in a range where the normal interval and the abnormal interval are not mixed for each path (processing P222).

Next, the problem point identifying unit 460 (see FIG. 1) processes the slices in sequence (processing P223). The problem point identifying unit 460 checks whether a next slice exists (processing P224). When the next slice exists (Yes in processing P224), the problem point identifying unit 460 determines whether an abnormal interval exists in the corresponding slice (processing P225). When the abnormal interval exists (Yes in processing P225), the problem point identifying unit 460 selects a slice having a high component coverage among the slices including the abnormal interval (processing P226), and narrows the abnormal point (processing P227).

The problem point identifying unit 460 updates a narrowing degree and records a more narrowed slice (processing P228). Next, the problem point identifying unit 460 determines whether the abnormal point can be identified (processing P229). When the abnormal point can be identified (Yes in processing P229), the problem point identifying unit 460 performs notification processing, for example, by displaying information of the identified abnormal point on the display device or the like (processing P230).

Note that, when the abnormal interval is not included in the slice (No in processing P225) or when the abnormal point cannot be identified (No in processing P229), the processing returns to processing P223. Meanwhile, when the next slice does not exist (No in processing P224), the notification processing is performed.

(Application to Business Processing System)

For example, as illustrated in FIG. 15, the normal interval and the abnormal interval are classified, and the determination is performed on each function interval in a superimposed manner at each determination interval. In the case of FIG. 15, “determination interval 1”=“normal, normal, abnormal”, “determination interval 2”=“normal, abnormal, abnormal”, and “determination interval 3”=“normal, normal, abnormal”. In this case, by the analysis on the region (range) of the “determination interval 2”, p3 (reservation inquiry) and p5 (DB2) are narrowed down as the problem points.
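The superimposition of the per-function intervals into determination intervals may be sketched as follows. The representation of a timeline as a list of half-open `(start, end, state)` tuples, the time values, and the names `state_at` and `determination_intervals` are assumptions made for illustration; FIG. 15 itself does not give concrete times.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float, str]  # (start, end, state), half-open [start, end)


def state_at(intervals: List[Interval], t: float) -> Optional[str]:
    """Return the state of one function at time t, or None when no data exists."""
    for start, end, state in intervals:
        if start <= t < end:
            return state
    return None


def determination_intervals(timelines: Dict[str, List[Interval]]):
    """Cut the time axis at every state change of every function."""
    cuts = sorted({t for ivs in timelines.values() for s, e, _ in ivs for t in (s, e)})
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # no state changes inside (lo, hi), so one sample suffices
        states = {f: state_at(ivs, mid) for f, ivs in timelines.items()}
        if None not in states.values():  # keep only ranges where all functions have data
            result.append((lo, hi, states))
    return result


# Hypothetical times reproducing the pattern described for FIG. 15.
timelines = {
    "F1": [(0, 30, "normal")],
    "F2": [(0, 10, "normal"), (10, 20, "abnormal"), (20, 30, "normal")],
    "F3": [(0, 30, "abnormal")],
}
for lo, hi, states in determination_intervals(timelines):
    print(lo, hi, states)
```

With these assumed times, the second range (10 to 20) yields “normal, abnormal, abnormal”, which corresponds to the “determination interval 2” used for the narrowing.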

(Method of Classifying Normal Interval and Abnormal Interval)

A sparse case where the normal interval and the abnormal interval exist in a sparse manner and a superimposed case where the normal interval and the abnormal interval exist in a superimposing manner may be considered.

(Sparse Case)

In the sparse case, classification by the following methods may be considered.

(Method 1) Each unit of request-response data (hereinafter referred to as “RR data”) is set as the determination interval (see FIG. 18). In other words, interval of RR data=normal interval or abnormal interval. Note that, in FIG. 18, data of the normal interval or the abnormal interval indicated by a rectangle corresponds to RR data.

(Method 2) The normal RR data are merged and set as the normal interval, and the abnormal RR data are merged and set as the abnormal interval (see FIG. 19). As compared with the method 1, the number of intervals can be suppressed, and thus the processing time can be reduced. In FIG. 19, several methods may also be considered, depending on the setting, for determining into which interval the RR data non-existence interval at the switch between the normal interval and the abnormal interval is incorporated.

(Method 2-1) The RR data non-existence interval at the switch between the normal interval and the abnormal interval is neither normal nor abnormal and is treated as “no data” (see FIG. 20). The present method 2-1 is used when the normal/abnormal intervals are to be found strictly.

(Method 2-1′) As in the above method 2-1, an RR data non-existence interval is treated as “no data”, but only when it exceeds a threshold value set for the normal interval and the abnormal interval (see FIG. 21). The threshold value may be an average value of the normal/abnormal RR data, or may be the threshold time used to determine normal/abnormal.

(Method 2-2) The interval may be switched at the timing where the next RR data of a different type (normal/abnormal) appears at the switch between the normal interval and the abnormal interval (see FIG. 22).

(Method 2-3) The interval is switched at the end timing of the last RR data of the same type (normal or abnormal) (see FIG. 23).

(Method 2-4) The interval is switched at a middle point between the normal RR data group and the abnormal RR data group (see FIG. 24). Note that the middle point is a non-limiting example; the switch point may be the middle of the data non-existence interval or a point separated by an average value of the normal RR data.

Basically, the method 2-1 or the method 2-1′ is used, and a case where the RR data non-existence interval is long may be treated as “no data”. This is because a correct result is not obtained when identification processing is performed using the matrix based on ambiguous information (for example, when an interval is treated as normal even though no data exists). However, in a case where there are too few RR data and the interval information necessary for analysis is incomplete, identification processing may be performed at the expense of accuracy, for example, by loosening the threshold value.
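One possible reading of the method 2 combined with the methods 2-1 and 2-1′ may be sketched as follows: consecutive RR data of the same type are merged, the gap at a switch between types is left uncovered (“no data”), and a gap between RR data of the same type is also left uncovered when it exceeds a threshold. The tuple representation and the parameter `max_gap` are assumptions made for illustration, and the sketch assumes that the RR data do not overlap.

```python
from typing import List, Tuple

RR = Tuple[float, float, str]  # (start, end, state) of one request-response record


def merge_rr(records: List[RR], max_gap: float) -> List[RR]:
    """Merge same-type RR data into intervals; uncovered time means "no data"."""
    merged: List[RR] = []
    for start, end, state in sorted(records):
        if merged:
            p_start, p_end, p_state = merged[-1]
            # Extend the previous interval only for the same type and a short gap.
            if p_state == state and start - p_end <= max_gap:
                merged[-1] = (p_start, max(p_end, end), state)
                continue
        # A type switch or a long gap starts a new interval.
        merged.append((start, end, state))
    return merged


records = [
    (0, 1, "normal"), (2, 3, "normal"),
    (5, 6, "abnormal"), (6.5, 8, "abnormal"),
    (20, 21, "abnormal"),
]
print(merge_rr(records, max_gap=2))
# [(0, 3, 'normal'), (5, 8, 'abnormal'), (20, 21, 'abnormal')]
```

Loosening the threshold, as mentioned above, corresponds to passing a larger `max_gap`, which yields fewer and longer intervals at the expense of accuracy.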

(Superimposed Case)

When the RR data are superimposed as illustrated in FIG. 25, the interval from the start to the end of the same type of RR data is basically set as one normal interval or abnormal interval, as illustrated in FIG. 26.

(Method 1) The interval is separated at the start timing (appearance timing) of the next RR data of a different type (normal/abnormal) (see FIG. 27). In typical cases, a delay is generated in one processing by a certain cause (for example, a DB lock), and another processing is kept waiting by that processing. Hence, a delay is likewise generated in the other processing. The present method 1 is based on the assumption that when the cause of the delay of the basic processing is resolved, the other processings end immediately, and subsequent RR data are normal.

(Method 2) When RR data of a different type (normal/abnormal) appears, the interval is separated at the end timing of the last RR data of the previous type (see FIG. 28).

(Method 3) The interval is separated as the normal interval at the start of the normal RR data, and the interval is separated at the end of the normal RR data (see FIG. 29). Usually, the method 3 may be used. The reason for separating the interval at the end of the normal RR data is that, although it cannot be known which portion is abnormal, the end of the normal RR data is evidence that the portion up to that end is normal. The reason for separating the interval at the start of the normal RR data is that the start of the normal RR data is evidence that the portion from that start is normal.
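The method 3 may be read as follows: the time covered by normal RR data is a normal interval, and the remaining time covered by abnormal RR data is an abnormal interval. The following sketch implements that reading with simple span arithmetic; the helper names `union`, `subtract` and `split_method3` and the sample times are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[float, float]


def union(spans: List[Span]) -> List[Span]:
    """Merge overlapping or touching spans into a sorted list of disjoint spans."""
    out: List[Span] = []
    for s, e in sorted(spans):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def subtract(spans: List[Span], holes: List[Span]) -> List[Span]:
    """Remove the holes from the spans; both inputs must be sorted and disjoint."""
    out: List[Span] = []
    for s, e in spans:
        cur = s
        for hs, he in holes:
            if he <= cur or hs >= e:
                continue  # the hole does not touch the remaining part of the span
            if hs > cur:
                out.append((cur, hs))
            cur = max(cur, he)
        if cur < e:
            out.append((cur, e))
    return out


def split_method3(records):
    """Return (normal intervals, abnormal intervals) for (start, end, state) records."""
    normal = union([(s, e) for s, e, st in records if st == "normal"])
    abnormal = union([(s, e) for s, e, st in records if st == "abnormal"])
    return normal, subtract(abnormal, normal)


records = [(0, 4, "normal"), (3, 10, "abnormal"), (8, 12, "normal"), (9, 11, "abnormal")]
print(split_method3(records))  # ([(0, 4), (8, 12)], [(4, 8)])
```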

(Variation)

A timing covering as many components as possible may be found. This is because the more components appear, the higher the narrowing degree. Also, a timing where as many functions (for example, URL types) as possible are gathered may be found. This is because the more patterns there are, the easier the narrowing.

For example, at a timing A illustrated in FIG. 30A, the RR data of some function (F2) does not appear. However, at a timing B illustrated in FIG. 30B, the RR data of all functions (F1, F2, F3) appear. In this case, the RR data of the timing B instead of the timing A may be used for determination.

Further, the determination may wait until a plurality of RR data of the same function (for example, URL) appears. This is because a single occurrence may be a coincidence. For example, at the timing A illustrated in FIG. 31A, only one RR data of each function F1, F2 and F3 appears. However, at the timing B illustrated in FIG. 31B, a plurality of RR data of each function F1, F2 and F3 appears. In this case, the RR data of the timing B instead of the timing A may be used.

(Analysis Apparatus Notifying Point Having Conflict Probability)

As schematically illustrated in FIG. 32A, the RR data temporally superimposed with the delay RR data is separated, and the problem point is narrowed in the separated range. This is based on the idea that the use of statistical values alone cannot detect the occurrence of instantaneous conflict.

(Detection of Concrete Conflict)

A point that can actually be narrowed down as the problem point is notified as a concrete conflict. FIG. 32B is an example in which p5 is a concrete conflict point.

(Detection of Implicit Conflict)

A point that does not appear as a common problem point but generates the same problem at a high probability if a problem occurs is notified as an implicit conflict (it should not conflict, but it conflicts with something behind the scenes). This corresponds, in a certain sense, to a combination of a short-term analysis and a long-term analysis. FIG. 32C illustrates an example in which p2 and p3 are the implicit conflict points.

(Notification of Conflict-Possible Point)

The point is notified as the conflict-possible point, including the concrete conflict and/or the implicit conflict. The accuracy may be ranked from the narrowing degree and the simultaneous generation probability.

(Accuracy is Improved by Supplement with Information Upon Analysis)

When the narrowing is impossible with the information of the analysis phase, a checkpoint such that “the identification is possible if no problem (or deterioration) is proved at this checkpoint” is extracted. For example, in FIG. 33, to identify which one of p4 and p5 is the cause, a request passing through the point is input from the data used in the pre-analysis phase. It is more efficient to prepare a supplementary table (index) from which a “candidate request” can be extracted for the point.

A flow of generating the supplementary table is illustrated in FIG. 34.

For example, the path analysis unit 420 (see FIG. 1) scans all points (p1, p2, p3, p4, p5) included in the path information (see, for example, FIG. 35A) in the path information database 404 (processing P311), and checks whether a point exists (processing P312).

As the checking result, when the point exists (Yes in processing P312), the path analysis unit 420 extracts all function IDs passing through the currently targeted point (key point) (x) (processing P313). For example, in FIGS. 33 and 35A, when the key point is p4, the functions F1 and F3 pass, and thus, the functions F1 and F3 are extracted. Also, when the key point is p1, the functions F1, F2, F2, F3 and F4 pass, and thus, the functions F1, F2, F2, F3 and F4 are extracted.

Next, the path analysis unit 420 extracts all points (Y) used by the extracted function ID group (processing P314). For example, when the extracted functions are F1 and F3, p1, p2, p3, p4 and p5 are extracted. Also, when the extracted functions are F1, F2, F2, F3 and F4, p1, p2, p4 and p5 are extracted.

For each function ID (a), when the point combination (x)-(Y) contains a point (exclusive point) (z) through which the function (a) itself does not pass, the path analysis unit 420 outputs the combination of (x), (z) and (a) to the table (processing P315), and returns to processing P311.

For example, the function F1 passes through all of (Y)=p1, p2, p4 and p5. The function F3 does not pass through p5 among (Y)=p1, p2, p4 and p5. In this case, the path analysis unit 420 outputs the record of p4, p5 and F3 to the table. The corresponding record means that the function F3 passes through p4 but does not pass through p5 (see FIG. 35B).

Also, the function F1 passes through all points. The function F2 does not pass through the points p4 and p5. Therefore, the path analysis unit 420 outputs the record of (p1, p4, F2) and (p1, p5, F2) to the table. Also, since the function F3 does not pass through the point p5, the path analysis unit 420 outputs the record of p1, p5, (F2), and F3 to the table. Also, since the function F4 does not pass through the point p4, the path analysis unit 420 outputs the record of p1, p4, (F2), and F4 to the table.

In this manner, with respect to the path information illustrated in FIG. 35A, the supplementary table (exclusive point table) illustrated in FIG. 35B is generated. Note that, in processing P312, when the point does not exist (No route in processing P312), the path analysis unit 420 ends the processing.
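The generation of the exclusive point table (processings P311 to P315) may be sketched as follows. Since FIG. 35A is not reproduced here, the path information in the example is a hypothetical one chosen so that F3 passes through p4 but not p5 and F4 passes through p5 but not p4; the function name `build_exclusive_table` is likewise an assumption.

```python
from typing import Dict, List, Set, Tuple


def build_exclusive_table(paths: Dict[str, Set[str]]) -> List[Tuple[str, str, str]]:
    """Return records (key point x, exclusive point z, function a)."""
    table: List[Tuple[str, str, str]] = []
    all_points = sorted(set().union(*paths.values()))
    for x in all_points:                                        # P311, P312
        funcs = sorted(f for f, p in paths.items() if x in p)   # P313
        used = set().union(*(paths[f] for f in funcs))          # P314: points (Y)
        for a in funcs:                                         # P315
            for z in sorted(used - paths[a]):
                # Function a passes through x but does not pass through z.
                table.append((x, z, a))
    return table


# Hypothetical path information (not the contents of FIG. 35A).
paths = {
    "F1": {"p1", "p2", "p4", "p5"},
    "F2": {"p1", "p2"},
    "F3": {"p1", "p2", "p4"},
    "F4": {"p1", "p2", "p5"},
}
for record in build_exclusive_table(paths):
    print(record)
```

With this assumed path information, the table contains, among others, the records (p4, p5, F3) and (p5, p4, F4), which are the kind of records used for the search described for FIG. 36.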

Note that it is more efficient to prepare a table (index) such that the “candidate request” can be extracted by narrowing. A flow of supplementing data when data is lacking is illustrated in FIG. 36.

The path analysis unit 420 performs analysis (processing P321), and checks whether a plurality of candidates exists (processing P322). For example, in the actual operation phase, p4 and p5 become delay candidates when data in which the function F1 is abnormal and the function F2 is normal exists and data regarding the functions F3 and F4 does not exist.

When the plurality of candidates exists (Yes in processing P322), the path analysis unit 420 divides points of the candidates (processing P323). For example, when the candidates are p4 and p5, the points of the candidates are divided into p4 and p5.

Next, the path analysis unit 420 searches the exclusive point table (see, for example, FIG. 35B) in all combinations of the divided points (processing P324). For example, the function F3 is found if searching the exclusive point table illustrated in FIG. 35B with a search key=p4 and an exclusive point key=p5. Also, the function F4 is found if searching the exclusive point table illustrated in FIG. 35B with a search key=p5 and an exclusive point key=p4.

The path analysis unit 420 checks whether the exclusive point exists (processing P325). When the exclusive point exists (Yes in processing P325), reanalysis is performed by searching a found function group from data of the pre-analysis phase and re-inputting the searched function group (processing P326). For example, data corresponding to the functions F3 and F4 found in processings P324 and P325 are re-input and analyzed.

By re-inputting the “candidate request”, the information of a deficient targeted check point can be supplemented, and the problem point can be narrowed (identified). For example, if no problem occurs when the request corresponding to the function F3 is re-input, it may be determined (identified) that the cause of deterioration is p5.

Note that, when the narrowing (identification) cannot be performed, another “candidate request” may be used. For example, if the deterioration is caused by re-inputting the request corresponding to the function F4, it may be determined that p4 is suspected as the cause of deterioration. The reliability may be increased by re-inputting a plurality of requests.
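The search of the exclusive point table and the use of the re-input results (processings P323 to P326) may be sketched as follows. The table contents, the function names `candidate_functions` and `remaining_candidates`, and the rule that a key point is cleared only when every found function is normal on re-input are assumptions made for illustration.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (search key point, exclusive point, function)


def candidate_functions(table: List[Record], candidates: Set[str]):
    """Search the table with every ordered pair of candidate points (P324, P325)."""
    found: Dict[Tuple[str, str], List[str]] = {}
    for key, excl in permutations(sorted(candidates), 2):
        funcs = sorted(a for x, z, a in table if x == key and z == excl)
        if funcs:
            found[(key, excl)] = funcs
    return found


def remaining_candidates(candidates, found, replay_results):
    """Clear a key point when every function found for it is normal on re-input."""
    cleared = set()
    for (key, _excl), funcs in found.items():
        if all(replay_results.get(f) == "normal" for f in funcs):
            cleared.add(key)
    return set(candidates) - cleared


# Hypothetical records corresponding to the search example in the text.
table = [("p4", "p5", "F3"), ("p5", "p4", "F4")]
found = candidate_functions(table, {"p4", "p5"})
print(found)  # {('p4', 'p5'): ['F3'], ('p5', 'p4'): ['F4']}

# Re-inputting the request of F3 shows no problem, so p4 is cleared and p5 remains.
print(remaining_candidates({"p4", "p5"}, found, {"F3": "normal"}))  # {'p5'}
```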

(Accuracy Improvement 1 Using Appearance Probability)

In order to improve the accuracy of the problem point identification, a method using an appearance probability (frequency) may be considered.

(Pre-Analysis Phase)

For example, as illustrated in FIG. 37, when the function F1 passes through two types of paths, F1=p1-p2-p3 and F1=p1-p2-p3-p4, it is impossible to identify which one of the two paths the function F1 passes through from external information such as a parameter of F1.

Herein, the path of p1-p2-p3 is set as F1-1, and the path of p1-p2-p3-p4 is set as F1-2. The parameter of F1 alone cannot classify which one of F1-1 and F1-2 the function F1 passes through, but which path the function F1 passes through can be identified in the pre-analysis phase. Thus, the path analysis unit 420 counts each frequency. As a result, the appearance probability of F1 may be prepared, for example, as follows: F1-1 is 70% and F1-2 is 30%.

(Actual Operation Phase)

From the information of the actual operation phase alone, it can be known from the parameter that the function is F1, but it cannot be identified whether the path is the F1-1 path or the F1-2 path. In a case where F1 has good response at a probability of 70% and bad response at a probability of 30%, the problem point identifying unit 460 may estimate that the point p4, which is the difference between F1-1 and F1-2, is the cause of deterioration.

(Flow Using Appearance Probability)

In processing P60 of the flow in the pre-analysis phase illustrated in FIG. 4, the path analysis unit 420, for example, registers the frequency information (table) in the appearance probability information database 405 as illustrated in FIG. 38.

As illustrated in FIG. 39, the path analysis unit 420 associates the data with the function (processing P331).

For example, the data and the function are associated as follows: “data 1: F1=◯”, “data 2: F1=◯”, “data 3: F2=◯”, “data 4: F3=x”, and “data 5: F1=x”.

Next, the path analysis unit 420 assembles a data group of functions having a plurality of paths (processing P332). For example, it can be known from the frequency information table illustrated in FIG. 38 that the function F1 has a plurality of path information. Thus, three data groups, “data 1: F1=◯”, “data 2: F1=◯”, and “data 5: F1=x”, are assembled.

Further, the path analysis unit 420 calculates a normal to abnormal ratio with respect to data where a plurality of paths exists in one function (processing P333). In the case of the above-described example, 66.7% is normal and 33.3% is abnormal.

The path analysis unit 420 checks whether the normal to abnormal ratio of the data can be considered equal to the frequency information (processing P334). In the case of the above-described example, since 66.7% is normal and 33.3% is abnormal, they can be considered equal to each other. When they are considered equal (Yes in processing P334), the path analysis unit 420 associates the frequency information with the appropriate path information (processing P335). On the other hand, when they are not considered equal (No in processing P334), the path analysis unit 420 treats a high-frequency path as representative data (processing P336).
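The comparison in processings P333 to P336 may be sketched as follows for a function having two paths. The tolerance used to decide that the ratio “can be considered equal” is not specified in the embodiment, so the `tolerance` parameter, its default value, and the function name `associate_paths` are assumptions; the frequency values follow the example of F1-1 being 70% and F1-2 being 30%. The sketch assumes a non-empty list of states.

```python
from typing import Dict, List, Tuple


def associate_paths(states: List[str], frequency: Dict[str, float],
                    tolerance: float = 0.05) -> Tuple[str, str, float]:
    """Compare the abnormal ratio of one function with its path frequencies."""
    abnormal_ratio = states.count("abnormal") / len(states)       # P333
    for path, freq in frequency.items():                          # P334
        if abs(freq - abnormal_ratio) <= tolerance:
            # P335: the abnormal data are associated with this path.
            return ("matched", path, abnormal_ratio)
    # P336: no path matches, so the high-frequency path is the representative.
    representative = max(frequency, key=frequency.get)
    return ("representative", representative, abnormal_ratio)


# data 1: F1 normal, data 2: F1 normal, data 5: F1 abnormal
kind, path, ratio = associate_paths(
    ["normal", "normal", "abnormal"], {"F1-1": 0.7, "F1-2": 0.3}
)
print(kind, path, round(ratio, 3))  # matched F1-2 0.333
```

With a stricter tolerance the same data would fall through to the representative path F1-1, which illustrates how the choice of tolerance affects the result.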

(Accuracy Improvement 2 Using Appearance Probability)

As illustrated in FIG. 40, the path of the function F1 has two types, that is, F1(F1-1)=p1-p2-p4-p5 and F1(F1-2)=p1-p3-p5, the path of the function F2 is F2=p1-p3-p5, and the path of the function F3 is F3=p1-p2-p3. In this case, a plurality of paths that cannot be classified by the parameter or the like exists in the function F1.

The pre-data collection unit 410 reproduces the request data stored in the user request database 403, and the path analysis unit 420 counts a frequency of the request data passing through each function as illustrated in FIG. 41 (counts a frequency at each Fi and pi).

In the actual operation phase, the function selection unit 440 counts the appearance frequency of each check point (pi) (see FIG. 42). However, detailed information regarding the function F1 is not checked, because doing so would increase the log collection amount or the processing amount of the association processing.

In FIG. 40, it is assumed that F1 and F2 are deteriorated and F3 is normal. In this case, the possibility of being the cause of deterioration remains in p4 and p5. Among all requests of the function F1 during the aggregation interval of the actual operation phase, for example, 28% (=14/50) of the requests are assumed to be deteriorated. In this case, by comparison with the frequency information table illustrated in FIG. 41, the path of F1-2 (p1-p3-p5) is estimated as deteriorated. Hence, it can be seen that p4 (F1-1) is not deteriorated. As a result, p5 can be determined as the cause point.

According to one aspect, analyzable data can be found while suppressing a computation amount, which in turn improves analysis accuracy.

All examples and conditional language recited herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

What is claimed is:
 1. An analysis method comprising: calculating normal and abnormal states from log data with respect to each of a plurality of processings in which shared modules exist; calculating a timing of a change of the states; separating a time interval, in which the normal and abnormal states are not mixed, with respect to each of the plurality of processings, based on the calculated timing; and detecting an abnormal module in the time interval, based on relationship information between the plurality of processings and the modules.
 2. The analysis method according to claim 1, wherein a start timing of a different type of a state is an end timing of the time interval.
 3. The analysis method according to claim 1, wherein a last end timing of a same type of a state is an end timing of the time interval.
 4. The analysis method according to claim 1, wherein an end timing of a type of a state having appeared before a start timing of a different type of a state is an end timing of the time interval.
 5. The analysis method according to claim 1, wherein when necessary data enough to detect the abnormal module does not exist, data is supplemented by re-inputting necessary data from a stored user request.
 6. The analysis method according to claim 1, wherein when one of the processings passes through a plurality of paths being a set of the modules, the detection of the abnormal module is performed by identifying the paths using appearance probability.
 7. An analysis apparatus equipped with a computer, wherein the computer is configured to: calculate normal and abnormal states from log data with respect to each of a plurality of processings in which shared modules exist; calculate a timing of a change of the states; separate a time interval, in which the normal and abnormal states are not mixed, with respect to each of the plurality of processings, based on the calculated timing; and detect an abnormal module in the time interval, based on relationship information between the plurality of processings and the modules.
 8. A non-transitory computer-readable recording medium storing an analysis program to cause a computer to execute: calculating normal and abnormal states from log data with respect to each of a plurality of processings in which shared modules exist; calculating a timing of a change of the states; separating a time interval, in which the normal and abnormal states are not mixed, with respect to each of the plurality of processings, based on the calculated timing; and detecting an abnormal module in the time interval, based on relationship information between the plurality of processings and the modules.